SHAX: The Semantic Historical Archive eXplorer
Abstract
Newspaper archives are some of the richest historical document collections. Their study is, however, very tedious: one needs to physically visit the archives, search through reams of old, very fragile paper, and manually assemble cross-references. We present Shax, a visual newspaper-archive exploration tool that takes large historical archives as input and lets interested parties browse the information they contain chronologically or geographically so as to rediscover history. We used Shax on a selection of the Neue Zürcher Zeitung (NZZ), the longest continuously published German newspaper in Switzerland, with archives going back to 1780. Specifically, we took the highly noisy OCRed text segments, extracted pertinent entities, geolocations, and temporal information, linked them with the Linked Open Data cloud, and built a browser-based exploration platform. This platform enables users to interactively browse the 111,906 newspaper pages published from 1910 to 1920, a period that includes historic events such as World War I (WWI) and the Russian Revolution. Note that Shax is limited neither to this newspaper nor to this time period or language; rather, it exemplifies the power of combining semantic technologies with an exceptional dataset.
Similar resources
An Ontology-Based Archive for Historical Research
The digitalization of cultural materials is doubtless a key-enabler for increasing accessibility of cultural heritage documents, e.g., historical texts. In the last decade Semantic Digital Libraries (see, e.g., [1]) have attracted the attention of research communities coming from different research areas, such as Cultural Heritage, History, and Knowledge Engineering. In order to find more innov...
PRiSMHA (Providing Rich Semantic Metadata for Historical Archives)
In this paper we present the PRiSMHA project, whose main goal is to demonstrate that a rich semantic representation of the content of historical documents is useful since it can significantly improve the access to archival resources and sustainable thanks to a crowdsourcing approach. This goal poses interesting research challenges, both for the semantic model definition and the user interaction...
Interlinking current affairs with archives via the Semantic Web
The BBC has a very large archive of programmes, covering a wide range of topics. This archive holds a significant part of the BBC’s institutional memory and is an important part of the cultural history of the United Kingdom and the rest of the world. These programmes, or parts of them, can help provide valuable context and background for current news events. However the BBC’s archive catalogue ...
XML and Knowledge Technologies for Semantic-Based Indexing of Paper Documents
Effective daily processing of large amounts of paper documents in office environments requires the application of semantic-based indexing techniques during the transformation of paper documents to electronic format. For this purpose a combination of both XML and knowledge technologies can be used. XML distinguishes between data, its structure and semantics, allowing the exchange of data element...
Towards Semantic Enrichment of Newspapers: A Historical Ecology Use Case
Historical ecology research relies on historical accounts of human-animal interactions to study this interaction through space and time. Newspaper archives are a rich source of information, but require careful querying and filtering to collect the relevant information. Traditionally, this is a laborious manual task. In this position paper, we describe our ongoing work on semantically enriching ...